Increasing performance with multiply-add units and wide buses

Authors

  • David López
  • Mateo Valero
  • Josep Llosa
  • Eduard Ayguadé
Abstract

A balanced increase in memory bandwidth and computational performance is one of the current trends in high-performance microprocessors. This improvement can be attained either by replicating resources such as buses and functional units or by making them more complex. For example, some microprocessors, such as IBM’s POWER2, double the width of the buses between the register file and the first-level data cache to obtain results similar to those of doubling the number of buses, but at a lower cost. Similarly, some microprocessors, such as IBM’s POWER2 and RS6000 processors, have multiply-add fused functional units to increase computation capability. In this paper we evaluate the performance of these alternatives and their effects on register pressure. The performance benefits have been evaluated using 1180 kernel loops from the Perfect Club benchmarks, which account for 78% of the total execution time. The results show that both techniques (widening buses and using multiply-add fused functional units) are complementary, cost-effective solutions for increasing processor efficiency in numerical applications.
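The complementarity claimed in the abstract can be illustrated with a toy resource-bound model. This is only a sketch under stated assumptions, not the evaluation method used in the paper: the function `cycles_per_iteration`, its parameters, and the DAXPY-style loop body are all hypothetical choices made for illustration. The model assumes unit-stride accesses, so a bus of width 2 moves two consecutive elements per access, and it ignores latencies, recurrences, and register pressure.

```python
from fractions import Fraction


def cycles_per_iteration(loads, stores, muls, adds, fused_pairs,
                         buses=1, bus_width=1, fp_units=1, use_fma=False):
    """Resource-bound lower bound on cycles per loop iteration (toy model)."""
    # Assumption: unit-stride accesses, so a bus of width w carries w
    # consecutive elements per access and each element costs 1/w of a slot.
    mem_slots = Fraction(loads + stores, buses * bus_width)
    # With a fused unit, each multiply that feeds an add issues as a single
    # operation, removing one floating-point operation per fused pair.
    fp_ops = muls + adds - (fused_pairs if use_fma else 0)
    fp_slots = Fraction(fp_ops, fp_units)
    # The busier resource bounds the iteration rate.
    return max(mem_slots, fp_slots)


# Hypothetical DAXPY-style loop body: y[i] = a * x[i] + y[i]
daxpy = dict(loads=2, stores=1, muls=1, adds=1, fused_pairs=1)

baseline = cycles_per_iteration(**daxpy)
fma_only = cycles_per_iteration(**daxpy, use_fma=True)
wide_only = cycles_per_iteration(**daxpy, bus_width=2)
both = cycles_per_iteration(**daxpy, bus_width=2, use_fma=True)

for name, value in [("baseline", baseline), ("fma only", fma_only),
                    ("wide bus only", wide_only), ("both", both)]:
    print(f"{name:14s} {float(value):.1f} cycles/iteration")
```

In this hypothetical loop the baseline is memory-bound, so fusing the multiply and add alone does not lower the bound, while widening the bus alone shifts the bottleneck to the floating-point unit. Only the combination lowers both bounds, which is one way to read the abstract's statement that the two techniques are complementary.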

Similar resources

Modulo 2^n+1 Multiplier and Multiply-Adder for Digital Signal Processors

Nowadays, digital signal processors (DSPs) are appropriate choices for real-time image and video processing in embedded multimedia applications, not only because of their superior signal processing performance but also because of their high levels of integration and very low power consumption. Filtering, which consists of multiple addition and multiplication operations, is one of the most fundamental operatio...

A Novel Multiply-Accumulator Unit Bus Encoding Architecture for Image Processing Applications

In CMOS circuits, power dissipation is a major concern for VLSI functional units. With shrinking feature size, increased frequency and power dissipation on the data bus have become the most important factor compared to other parts of the functional units. One of the most important functional units in any processor is the Multiply-Accumulator unit (MAC). The current work focuses on the develop...

Impact on Performance of Fused Multiply-Add Units in Aggressive VLIW Architectures

Loops are the main time-consuming part of programs based on floating-point computations. The performance of the loops is limited either by recurrences in the computation or by the resources offered by the architecture. Several general-purpose superscalar microprocessors have been implemented with multiply-add fused floating-point units, which reduce the latency of the combined operation and the...

Parameterized Function Evaluation for FPGAs

This paper presents parameterized module-generators for pipelined function evaluation using lookup tables, adders, shifters and multipliers. We discuss trade-offs involved between (1) full-lookup tables, (2) bipartite (lookup-add) units, (3) lookup-multiply units, and (4) shift-and-add based CORDIC units. For lookup-multiply units we provide equations estimating approximation errors and roundin...

Floating-Point Single-Precision Fused Multiplier-adder Unit on FPGA

The fused multiply-add operation improves many calculations and therefore is already available in some general-purpose processors, such as the Itanium. The optimization of units dedicated to execute the multiply-add operation is therefore crucial to achieve optimal performance when running the overlying applications. In this paper, we present a single-precision floating-point fused multiply-add opt...

Publication date: 1997